{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 层与块"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 研究讨论“比单个层大”但“比整个模型小”的组件更有价值。\n",
    "* 当我们使用softmax回归时，一个单层本身就是模型。\n",
    "* 对于多层感知机而言，整个模型及其组成层都是这种架构。 整个模型接受原始输入（特征），生成输出（预测）， 并包含一些参数（所有组成层的参数集合）。 同样，每个单独的层接收输入（由前一层提供）， 生成输出（到下一层的输入），并且具有一组可调参数， 这些参数根据从下一层反向传播的信号进行更新。\n",
    "* 块（block）可以描述单个层、由多个层组成的组件或整个模型本身。 使用块进行抽象的一个好处是可以将一些块组合成更大的组件， 这一过程通常是递归的。通过定义代码来按需生成任意复杂度的块， 我们可以通过简洁的代码实现复杂的神经网络。    \n",
    "![1.png](https://image.baidu.com/search/down?url=https://tva1.sinaimg.cn/large/005T39qaly1h0677lrvnuj30h006ztac.jpg)    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面的代码生成一个网络，其中包含一个具有256个单元和ReLU激活函数的全连接隐藏层， 然后是一个具有10个隐藏单元且不带激活函数的全连接输出层。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[ 0.1972,  0.1156,  0.2277,  0.0083,  0.1432,  0.1982,  0.1622,  0.0209,\n",
       "         -0.0954, -0.0444],\n",
       "        [ 0.3239,  0.1790,  0.1620, -0.0362,  0.0556,  0.1570,  0.1533,  0.1342,\n",
       "         -0.2297,  0.1004]], grad_fn=<AddmmBackward0>)"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch\n",
    "from torch import nn\n",
    "from torch.nn import functional as F\n",
    "net = nn.Sequential(nn.Linear(20,256),nn.ReLU(),nn.Linear(256,10))\n",
    "\n",
    "X = torch.rand(2,20)\n",
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " * 层的执行顺序是作为参数传递的。 \n",
    " * 简而言之，nn.Sequential定义了一种特殊的Module， 即在PyTorch中表示一个块的类， 它维护了一个由Module组成的有序列表。 \n",
    " * 注意，两个全连接层都是Linear类的实例， Linear类本身就是Module的子类。 \n",
    " * 另外，到目前为止，我们一直在通过net(X)调用我们的模型来获得模型的输出。 这实际上是net.__call__(X)的简写。 \n",
    " * 这个前向传播函数非常简单： 它将列表中的每个块连接在一起，将每个块的输出作为下一个块的输入。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "自定义块\n",
    "* 将输入数据作为其前向传播函数的参数。\n",
    "* 通过前向传播函数来生成输出。请注意，输出的形状可能与输入的形状不同。例如，我们上面模型中的第一个全连接的层接收一个20维的输入，但是返回一个维度为256的输出。\n",
    "* 计算其输出关于输入的梯度，可通过其反向传播函数进行访问。通常这是自动发生的。\n",
    "* 存储和访问前向传播计算所需的参数。\n",
    "* 根据需要初始化模型参数。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从零开始编写一个块，它包含一个多层感知机，其具有256个隐藏单元的隐藏层和一个10维输出层。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MLP(nn.Module):\n",
    "    # 用模型参数声明层，这里，我们声明两个全连接层\n",
    "    def __init__(self):\n",
    "        # 调用MLP的父类Module的构造函数来执行必要的初始化\n",
    "        # 这样，在类实例化时也可以指定其他函数参数\n",
    "        super().__init__()\n",
    "        self.hidden = nn.Linear(20,256)\n",
    "        self.out = nn.Linear(256,10)\n",
    "    # 定义模型的前向传播，即如何根据输入X返回所需的模型输出\n",
    "    def forward(self,X):\n",
    "        # 注意，这里我们使用ReLU的函数版本，其在nn.functional模块中定义\n",
    "        return self.out(F.relu(self.hidden(X)))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 首先看一下前向传播函数，它以X作为输入， 计算带有激活函数的隐藏表示，并输出其未规范化的输出值。\n",
    "* 实例化多层感知机的层，然后在每次调用前向传播函数时调用这些层。\n",
    "    * 首先，我们定制的__init__函数通过super().__init__() 调用父类的__init__函数， 省去了重复编写模版代码的痛苦。 \n",
    "    * 然后，我们实例化两个全连接层， 分别为self.hidden和self.out。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[-0.1407,  0.1527, -0.1043, -0.0160,  0.0655, -0.1556,  0.1264,  0.0120,\n",
       "          0.1021,  0.3019],\n",
       "        [-0.1537,  0.1940, -0.2014, -0.1459,  0.0060, -0.1113,  0.1260,  0.0605,\n",
       "         -0.1013,  0.1483]], grad_fn=<AddmmBackward0>)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net = MLP()\n",
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "块的一个主要优点是它的多功能性。 我们可以子类化块以创建层（如全连接层的类）、 整个模型（如上面的MLP类）或具有中等复杂度的各种组件。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "顺序块\n",
    "* 一种将块逐个追加到列表中的函数。\n",
    "* 一种前向传播函数，用于将输入按追加块的顺序传递给块组成的“链条”。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MySequential(nn.Module):\n",
    "    def __init__(self,*args):\n",
    "        super().__init__()\n",
    "        for idx,module in enumerate(args):\n",
    "            # 这里，module是Module子类的一个实例，我们把它保存在Module类的成员\n",
    "            # 变量_modules中，module的类型是OrderedDict\n",
    "            self._modules[str(idx)] = module\n",
    "    def forward(self,X):\n",
    "        # OrderedDict保证了按照成员添加的顺序遍历它们\n",
    "        for block in self._modules.values():\n",
    "            X = block(X)\n",
    "        return X"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* __init__函数将每个模块逐个添加到有序字典_modules中。\n",
    "* _modules的主要优点是： 在模块的参数初始化过程中， 系统知道在_modules字典中查找需要初始化参数的子块。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[ 0.0484,  0.1080,  0.0407, -0.1762,  0.1474, -0.0440, -0.0340, -0.0187,\n",
       "          0.0622, -0.1144],\n",
       "        [ 0.0664,  0.3219,  0.0471, -0.0636,  0.3135, -0.0727, -0.0729, -0.0848,\n",
       "          0.0243, -0.0972]], grad_fn=<AddmmBackward0>)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net = MySequential(nn.Linear(20,256),nn.ReLU(),nn.Linear(256,10))\n",
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在前向传播函数中执行代码\n",
    "* 当需要更强的灵活性时，我们需要定义自己的块。\n",
    "* 我们可能希望执行任意的数学运算， 而不是简单地依赖预定义的神经网络层。\n",
    "* 有时我们可能希望合并既不是上一层的结果也不是可更新参数的项， 我们称之为常数参数（constant parameter）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们需要一个计算函数  f(x,w)=c⋅w^⊤x 的层， 其中 x 是输入，  w 是参数，  c 是某个在优化过程中没有更新的指定常量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "class FixedHiddenMLP(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        # 不计算梯度的随机权重参数，在训练期间保持不变\n",
    "        self.rand_weight = torch.rand((20,20),requires_grad=False)\n",
    "        self.linear = nn.Linear(20,20)\n",
    "    \n",
    "    def forward(self,X):\n",
    "        X = self.linear(X)\n",
    "        # 使用创建的常量参数以及relu和mm函数\n",
    "        X = F.relu(torch.mm(X,self.rand_weight)+1)\n",
    "        # 复用全连接层，相当于两个全连接层共享参数\n",
    "        X = self.linear(X)\n",
    "        # 控制流\n",
    "        while X.abs().sum() > 1:\n",
    "            X /= 2\n",
    "        return X.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在这个FixedHiddenMLP模型中，我们实现了一个隐藏层， 其权重（self.rand_weight）在实例化时被随机初始化，之后为常量。 这个权重不是一个模型参数，因此它永远不会被反向传播更新。 然后，神经网络将这个固定层的输出通过一个全连接层。   \n",
    "代码执行了一个while循环，在 L1 范数大于 1 的条件下， 将输出向量除以 2 ，直到它满足条件为止。 最后，模型返回了X中所有项的和。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor(0.1392, grad_fn=<SumBackward0>)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net = FixedHiddenMLP()\n",
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以混合搭配各种组合块的方法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor(0.2140, grad_fn=<SumBackward0>)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "class NestMLP(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.net = nn.Sequential(nn.Linear(20,64),nn.ReLU(),\n",
    "                                nn.Linear(64,32),nn.ReLU())\n",
    "        self.linear = nn.Linear(32,16)\n",
    "    def forward(self,X):\n",
    "        return self.linear(self.net(X))\n",
    "\n",
    "chimera = nn.Sequential(NestMLP(),nn.Linear(16,20),FixedHiddenMLP())\n",
    "chimera(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " 在深度学习环境中，我们担心速度极快的GPU可能要等到CPU运行Python代码后才能运行另一个作业。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 一个块可以由许多层组成；一个块可以由许多块组成。\n",
    "* 块可以包含代码。\n",
    "* 块负责大量的内部处理，包括参数初始化和反向传播。\n",
    "* 层和块的顺序连接由Sequential块处理。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 参数管理\n",
    "* 访问参数，用于调试、诊断和可视化。\n",
    "* 参数初始化。\n",
    "* 在不同模型组件间共享参数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[-0.1367],\n",
       "        [-0.3125]], grad_fn=<AddmmBackward0>)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch \n",
    "from torch import nn\n",
    "\n",
    "net = nn.Sequential(nn.Linear(4,8),nn.ReLU(),nn.Linear(8,1))\n",
    "X = torch.rand(size=(2,4))\n",
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "参数访问\n",
    "* 当通过Sequential类定义模型时， 我们可以通过索引来访问模型的任意层。\n",
    "* 这就像模型是一个列表一样，每层的参数都在其属性中。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查看第二个全连接层的参数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "OrderedDict([('weight', tensor([[ 0.1076,  0.2065, -0.0038, -0.0658,  0.3479,  0.0243, -0.1627,  0.1336]])), ('bias', tensor([-0.1956]))])\n"
     ]
    }
   ],
   "source": [
    "print(net[2].state_dict())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " 首先，这个全连接层包含两个参数，分别是该层的权重和偏置。 两者都存储为单精度浮点数（float32）。         \n",
    " 注意，参数名称允许唯一标识每个参数，即使在包含数百个层的网络中也是如此。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "目标参数\n",
    "* 每个参数都表示为参数类的一个实例。 \n",
    "* 要对参数执行任何操作，首先我们需要访问底层的数值。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面的代码从第二个全连接层（即第三个神经网络层）提取偏置， 提取后返回的是一个参数类实例，并进一步访问该参数的值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'torch.nn.parameter.Parameter'>\n",
      "Parameter containing:\n",
      "tensor([-0.1956], requires_grad=True)\n",
      "tensor([-0.1956])\n"
     ]
    }
   ],
   "source": [
    "print(type(net[2].bias))\n",
    "print(net[2].bias)\n",
    "print(net[2].bias.data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "参数是复合的对象，包含值、梯度和额外信息。      \n",
    "在上面这个网络中，由于我们还没有调用反向传播，所以参数的梯度处于初始状态。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net[2].weight.grad == None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一次性访问所有参数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('weight', torch.Size([8, 4])) ('bias', torch.Size([8]))\n",
      "('0.weight', torch.Size([8, 4])) ('0.bias', torch.Size([8])) ('2.weight', torch.Size([1, 8])) ('2.bias', torch.Size([1]))\n"
     ]
    }
   ],
   "source": [
    "print(*[(name,param.shape) for name,param in net[0].named_parameters()])\n",
    "print(*[(name,param.shape) for name,param in net.named_parameters()])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "另一种访问网络参数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([-0.1956])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net.state_dict()['2.bias'].data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从嵌套块收集参数    \n",
    "* 首先定义一个生成块的函数（可以说是“块工厂”），然后将这些块组合到更大的块中"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[-0.2512],\n",
       "        [-0.2512]], grad_fn=<AddmmBackward0>)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def block1():\n",
    "    return nn.Sequential(nn.Linear(4,8),nn.ReLU(),nn.Linear(8,4),nn.ReLU())\n",
    "\n",
    "def block2():\n",
    "    net = nn.Sequential()\n",
    "    for i in range(4):\n",
    "        # 在这里嵌套\n",
    "        net.add_module(f'block {i}',block1())\n",
    "    return net\n",
    "\n",
    "rgnet = nn.Sequential(block2(),nn.Linear(4,1))\n",
    "rgnet(X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sequential(\n",
      "  (0): Sequential(\n",
      "    (block 0): Sequential(\n",
      "      (0): Linear(in_features=4, out_features=8, bias=True)\n",
      "      (1): ReLU()\n",
      "      (2): Linear(in_features=8, out_features=4, bias=True)\n",
      "      (3): ReLU()\n",
      "    )\n",
      "    (block 1): Sequential(\n",
      "      (0): Linear(in_features=4, out_features=8, bias=True)\n",
      "      (1): ReLU()\n",
      "      (2): Linear(in_features=8, out_features=4, bias=True)\n",
      "      (3): ReLU()\n",
      "    )\n",
      "    (block 2): Sequential(\n",
      "      (0): Linear(in_features=4, out_features=8, bias=True)\n",
      "      (1): ReLU()\n",
      "      (2): Linear(in_features=8, out_features=4, bias=True)\n",
      "      (3): ReLU()\n",
      "    )\n",
      "    (block 3): Sequential(\n",
      "      (0): Linear(in_features=4, out_features=8, bias=True)\n",
      "      (1): ReLU()\n",
      "      (2): Linear(in_features=8, out_features=4, bias=True)\n",
      "      (3): ReLU()\n",
      "    )\n",
      "  )\n",
      "  (1): Linear(in_features=4, out_features=1, bias=True)\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "print(rgnet)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为层是分层嵌套的，所以我们也可以像通过嵌套列表索引一样访问它们。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([-0.0420, -0.2995, -0.0799, -0.0941, -0.1794,  0.2257,  0.3937,  0.1855])"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rgnet[0][1][0].bias.data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "参数初始化\n",
    "* 深度学习框架提供默认随机初始化， 也允许我们创建自定义初始化方法， 满足我们通过其他规则实现初始化权重。\n",
    "* 默认情况下，PyTorch会根据一个范围均匀地初始化权重和偏置矩阵， 这个范围是根据输入和输出维度计算出的。 \n",
    "* PyTorch的nn.init模块提供了多种预置初始化方法。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将所有权重参数初始化为标准差为0.01的高斯随机变量， 且将偏置参数设置为0。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(tensor([ 0.0140, -0.0007,  0.0046, -0.0131]), tensor(0.))"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def init_normal(m):\n",
    "    if type(m) == nn.Linear:\n",
    "        nn.init.normal_(m.weight,mean=0,std=0.01)\n",
    "        nn.init.zeros_(m.bias)\n",
    "net.apply(init_normal)\n",
    "net[0].weight.data[0],net[0].bias.data[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们还可以将所有参数初始化为给定的常数，比如初始化为1。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(tensor([1., 1., 1., 1.]), tensor(0.))"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def init_constant(m):\n",
    "    if type(m) == nn.Linear:\n",
    "        nn.init.constant_(m.weight,1)\n",
    "        nn.init.zeros_(m.bias)\n",
    "net.apply(init_constant)\n",
    "net[0].weight.data[0],net[0].bias.data[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们还可以对某些块应用不同的初始化方法。        \n",
    "* 下面我们使用Xavier初始化方法初始化第一个神经网络层， 然后将第三个神经网络层初始化为常量值42。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([ 0.4477,  0.6830, -0.4681,  0.6651])\n",
      "tensor([[42., 42., 42., 42., 42., 42., 42., 42.]])\n"
     ]
    }
   ],
   "source": [
    "def xavier(m):\n",
    "    if type(m) == nn.Linear:\n",
    "        nn.init.xavier_uniform_(m.weight)\n",
    "\n",
    "def init_42(m):\n",
    "    if type(m) == nn.Linear:\n",
    "        nn.init.constant_(m.weight,42)\n",
    "\n",
    "net[0].apply(xavier)\n",
    "net[2].apply(init_42)\n",
    "print(net[0].weight.data[0])\n",
    "print(net[2].weight.data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "自定义初始化\n",
    "* 有时，深度学习框架没有提供我们需要的初始化方法.\n",
    "* 使用以下的分布为任意权重参数 w 定义初始化方法：   \n",
    "![1.png](https://image.baidu.com/search/down?url=https://tva1.sinaimg.cn/large/005T39qaly1h07cxo5xdsj308s03474k.jpg)    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "init weight torch.Size([8, 4])\n",
      "init weight torch.Size([1, 8])\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "tensor([[ 0.0000,  0.0000,  9.3864, -7.9540],\n",
       "        [ 0.0000,  9.8976,  5.9511, -0.0000]], grad_fn=<SliceBackward0>)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def my_init(m):\n",
    "    if type(m) == nn.Linear:\n",
    "        print(\"init\",*[(name,param.shape)\n",
    "                        for name,param in m.named_parameters()][0])\n",
    "        nn.init.uniform_(m.weight,-10,10)\n",
    "        m.weight.data *= m.weight.data.abs() >= 5\n",
    "\n",
    "net.apply(my_init)\n",
    "net[0].weight[:2]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以直接设置参数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([ 0.0000,  0.0000,  9.3864, -7.9540])\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "tensor([42.0000,  1.0000, 10.3864, -6.9540])"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(net[0].weight.data[0])\n",
    "net[0].weight.data[:] += 1\n",
    "net[0].weight.data[0,0] = 42\n",
    "net[0].weight.data[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "参数绑定\n",
    "* 在多个层间共享参数：我们可以定义一个稠密层，然后使用它的参数来设置另一个层的参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([True, True, True, True, True, True, True, True])\n",
      "tensor([True, True, True, True, True, True, True, True])\n"
     ]
    }
   ],
   "source": [
    "# 需要给共享层一个名称，以便可以引用它的参数\n",
    "shared = nn.Linear(8,8)\n",
    "net = nn.Sequential(nn.Linear(4,8),nn.ReLU(),\n",
    "                    shared,nn.ReLU(),\n",
    "                    shared,nn.ReLU(),\n",
    "                    nn.Linear(8,1))\n",
    "net(X)\n",
    "# 检查参数是否相同\n",
    "print(net[2].weight.data[0] == net[4].weight.data[0])\n",
    "net[2].weight.data[0,0] = 10\n",
    "# 确保它们实际上是同一个对象，而不只是有相同的值\n",
    "print(net[2].weight.data[0] == net[4].weight.data[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " 它们不仅值相等，而且由相同的张量表示。  \n",
    " 当参数绑定时，梯度会发生什么情况？ 答案是由于模型参数包含梯度，因此在反向传播期间第二个隐藏层 （即第三个神经网络层）和第三个隐藏层（即第五个神经网络层）的梯度会加在一起。   \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 延后初始化\n",
    "* 延后初始化（defers initialization）， 即直到数据第一次通过模型传递时，框架才会动态地推断出每个层的大小。\n",
    "* 延后初始化使框架能够自动推断参数形状，使修改模型架构变得容易，避免了一些常见的错误。\n",
    "* 我们可以通过模型传递数据，使框架最终初始化参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "延后初始化： Sequential(\n",
      "  (0): LazyLinear(in_features=0, out_features=256, bias=True)\n",
      "  (1): ReLU()\n",
      "  (2): Linear(in_features=256, out_features=10, bias=True)\n",
      ")\n",
      "数据通过模型后，自动初始化： Sequential(\n",
      "  (0): Linear(in_features=20, out_features=256, bias=True)\n",
      "  (1): ReLU()\n",
      "  (2): Linear(in_features=256, out_features=10, bias=True)\n",
      ")\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "D:\\Anaconda\\envs\\d2l\\lib\\site-packages\\torch\\nn\\modules\\lazy.py:178: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment.\n",
      "  warnings.warn('Lazy modules are a new feature under heavy development '\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "from torch import nn\n",
    "net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(),nn.Linear(256,10))\n",
    "print(\"延后初始化：\",net)\n",
    "[net[i].state_dict() for i in range(len(net))]\n",
    "low = torch.finfo(torch.float32).min/10\n",
    "high = torch.finfo(torch.float32).max/10\n",
    "X = torch.zeros([2,20],dtype=torch.float32).uniform_(low, high)\n",
    "net(X)\n",
    "print(\"数据通过模型后，自动初始化：\",net)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 自定义层\n",
    "我们可以用创造性的方式组合不同的层，从而设计出适用于各种任务的架构。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "不带参数的层"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn.functional as F\n",
    "from torch import nn\n",
    "\n",
    "class CenteredLayer(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "\n",
    "    def forward(self,X):\n",
    "        return X - X.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们向该层提供一些数据，验证它是否能按预期工作。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([-2., -1.,  0.,  1.,  2.])"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "layer = CenteredLayer()\n",
    "layer(torch.FloatTensor([1,2,3,4,5]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将层作为组件合并到更复杂的模型中"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "net = nn.Sequential(nn.Linear(8,128),CenteredLayer())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "作为额外的健全性检查，我们可以在向该网络发送随机数据后，检查均值是否为0。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor(3.7253e-09, grad_fn=<MeanBackward0>)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Y = net(torch.rand(4,8))\n",
    "Y.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "带参数的层\n",
    "* 使用内置函数来创建参数，这些函数提供一些基本的管理功能。 比如管理访问、初始化、共享、保存和加载模型参数。\n",
    "* 这样，我们不需要为每个自定义层编写自定义的序列化程序。\n",
    "* 该层需要两个参数，一个用于表示权重，另一个用于表示偏置项。 \n",
    "* 使用修正线性单元作为激活函数。 该层需要输入参数：in_units和units，分别表示输入数和输出数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MyLinear(nn.Module):\n",
    "    def __init__(self,in_units,units):\n",
    "        super().__init__()\n",
    "        self.weight = nn.Parameter(torch.randn(in_units,units))\n",
    "        self.bias = nn.Parameter(torch.randn(units,))\n",
    "\n",
    "    def forward(self,X):\n",
    "        linear = torch.matmul(X,self.weight.data) + self.bias.data\n",
    "        return F.relu(linear)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "实例化MyLinear类并访问其模型参数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Parameter containing:\n",
       "tensor([[-0.3054, -0.0129,  0.5341],\n",
       "        [-1.2062, -0.5955,  0.0484],\n",
       "        [ 0.3001, -0.5214, -1.0337],\n",
       "        [-1.4682,  0.1071,  0.3719],\n",
       "        [-0.6779, -1.7879,  0.0868]], requires_grad=True)"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "linear = MyLinear(5,3)\n",
    "linear.weight"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用自定义层直接执行前向传播计算"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[0.0094, 0.0000, 0.0000],\n",
       "        [0.0000, 0.0000, 0.0000]])"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "linear(torch.rand(2,5))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以使用自定义层构建模型，就像使用内置的全连接层一样使用自定义层"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[0.8648],\n",
       "        [0.0000]])"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net = nn.Sequential(MyLinear(64,8),MyLinear(8,1))\n",
    "net(torch.rand(2,64))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 小结\n",
    "* 我们可以通过基本层类设计自定义层。这允许我们定义灵活的新层，其行为与深度学习框架中的任何现有层不同。\n",
    "* 在自定义层定义完成后，我们就可以在任意环境和网络架构中调用该自定义层。\n",
    "* 层可以有局部参数，这些参数可以通过内置函数创建。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 读写文件\n",
    "当运行一个耗时较长的训练过程时， 最佳的做法是定期保存中间结果， 以确保在服务器电源被不小心断掉时，我们不会损失几天的计算结果。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "加载和保存张量\n",
    "* 对于单个张量，我们可以直接调用load和save函数分别读写它们。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch \n",
    "from torch import nn\n",
    "from torch.nn import functional as F\n",
    "\n",
    "x = torch.arange(4)\n",
    "torch.save(x,'x-file')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将存储在文件中的数据读回内存"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([0, 1, 2, 3])"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x2 = torch.load('x-file')\n",
    "x2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "存储一个张量列表，然后把它们读回内存"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(tensor([0, 1, 2, 3]), tensor([0., 0., 0., 0.]))"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y = torch.zeros(4)\n",
    "torch.save([x,y],'x-files')\n",
    "x2,y2 = torch.load('x-files')\n",
    "(x2,y2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "写入或读取从字符串映射到张量的字典"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'x': tensor([0, 1, 2, 3]), 'y': tensor([0., 0., 0., 0.])}"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mydict = {'x':x,'y':y}\n",
    "torch.save(mydict,'mydict')\n",
    "mydict2 = torch.load('mydict')\n",
    "mydict2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "加载和保存模型参数\n",
    "* 深度学习框架提供了内置函数来保存和加载整个网络,这将保存模型的参数而不是保存整个模型。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MLP(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.hidden = nn.Linear(20,256)\n",
    "        self.output = nn.Linear(256,10)\n",
    "    \n",
    "    def forward(self,x):\n",
    "        return self.output(F.relu(self.hidden(x)))\n",
    "\n",
    "net = MLP()\n",
    "X = torch.randn(size=(2,20))\n",
    "Y = net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将模型的参数存储在一个叫做“mlp.params”的文件中"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "torch.save(net.state_dict(),'mlp.params')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为了恢复模型，我们实例化了原始多层感知机模型的一个备份。        \n",
    "这里我们不需要随机初始化模型参数，而是直接读取文件中存储的参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MLP(\n",
       "  (hidden): Linear(in_features=20, out_features=256, bias=True)\n",
       "  (output): Linear(in_features=256, out_features=10, bias=True)\n",
       ")"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "clone = MLP()\n",
    "clone.load_state_dict(torch.load('mlp.params'))\n",
    "clone.eval()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "验证结果：由于两个实例具有相同的模型参数，在输入相同的X时， 两个实例的计算结果应该相同。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[True, True, True, True, True, True, True, True, True, True],\n",
       "        [True, True, True, True, True, True, True, True, True, True]])"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Y_clone = clone(X)\n",
    "Y_clone == Y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 小结\n",
    "* save和load函数可用于张量对象的文件读写。\n",
    "* 我们可以通过参数字典保存和加载网络的全部参数。\n",
    "* 保存架构必须在代码中完成，而不是在参数中完成"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### GPU"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用nvidia-smi命令来查看显卡信息(语句前加!表示执行命令行)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sat Aug 06 17:25:23 2022       \n",
      "+-----------------------------------------------------------------------------+\n",
      "| NVIDIA-SMI 456.71       Driver Version: 456.71       CUDA Version: 11.1     |\n",
      "|-------------------------------+----------------------+----------------------+\n",
      "| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
      "| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
      "|===============================+======================+======================|\n",
      "|   0  GeForce RTX 2070   WDDM  | 00000000:01:00.0  On |                  N/A |\n",
      "|  0%   54C    P8    24W / 175W |   1017MiB /  8192MiB |      0%      Default |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "                                                                               \n",
      "+-----------------------------------------------------------------------------+\n",
      "| Processes:                                                                  |\n",
      "|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n",
      "|        ID   ID                                                   Usage      |\n",
      "|=============================================================================|\n",
      "|    0   N/A  N/A      1188    C+G   Insufficient Permissions        N/A      |\n",
      "|    0   N/A  N/A      2548    C+G   C:\\Windows\\explorer.exe         N/A      |\n",
      "|    0   N/A  N/A      7392    C+G   ...bbwe\\Microsoft.Photos.exe    N/A      |\n",
      "|    0   N/A  N/A      7448    C+G   ...ekyb3d8bbwe\\HxOutlook.exe    N/A      |\n",
      "|    0   N/A  N/A     10552    C+G   ...y\\ShellExperienceHost.exe    N/A      |\n",
      "|    0   N/A  N/A     10912    C+G   ...lPanel\\SystemSettings.exe    N/A      |\n",
      "|    0   N/A  N/A     11520    C+G   ...e\\PhoneExperienceHost.exe    N/A      |\n",
      "|    0   N/A  N/A     12436    C+G   ...2txyewy\\TextInputHost.exe    N/A      |\n",
      "|    0   N/A  N/A     12636    C+G   ...in7x64\\steamwebhelper.exe    N/A      |\n",
      "|    0   N/A  N/A     14260    C+G   ...icrosoft VS Code\\Code.exe    N/A      |\n",
      "|    0   N/A  N/A     14672    C+G   ...7pnf6hceqser\\snipaste.exe    N/A      |\n",
      "|    0   N/A  N/A     15680    C+G   E:\\Zotero\\zotero.exe            N/A      |\n",
      "|    0   N/A  N/A     17252    C+G   ...8wekyb3d8bbwe\\Cortana.exe    N/A      |\n",
      "|    0   N/A  N/A     17904    C+G   ...kyb3d8bbwe\\Calculator.exe    N/A      |\n",
      "|    0   N/A  N/A     18500    C+G   ...me\\Application\\chrome.exe    N/A      |\n",
      "|    0   N/A  N/A     18880    C+G   ...3\\jbr\\bin\\jcef_helper.exe    N/A      |\n",
      "|    0   N/A  N/A     19052    C+G   ...5n1h2txyewy\\SearchApp.exe    N/A      |\n",
      "|    0   N/A  N/A     20144    C+G   ...ge\\Application\\msedge.exe    N/A      |\n",
      "|    0   N/A  N/A     20200    C+G   ...5n1h2txyewy\\SearchApp.exe    N/A      |\n",
      "+-----------------------------------------------------------------------------+\n"
     ]
    }
   ],
   "source": [
    "!nvidia-smi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " 通过智能地将数组分配给上下文， 我们可以最大限度地减少在设备之间传输数据的时间"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "计算设备\n",
    "* 默认情况下，张量是在内存中创建的，然后使用CPU计算它。\n",
    "* 在PyTorch中，CPU和GPU可以用torch.device('cpu') 和torch.device('cuda')表示。\n",
    "* cpu设备意味着所有物理CPU和内存， 这意味着PyTorch的计算将尝试使用所有CPU核心。 然而，gpu设备只代表一个卡和相应的显存。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(device(type='cpu'), device(type='cuda'), device(type='cuda', index=1))"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch \n",
    "from torch import nn\n",
    "\n",
    "torch.device(type='cpu'),torch.device('cuda'),torch.device('cuda:1')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查询可用gpu的数量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "torch.cuda.device_count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "两个函数允许我们在不存在所需所有GPU的情况"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(device(type='cuda', index=0),\n",
       " device(type='cpu'),\n",
       " [device(type='cuda', index=0)])"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def try_gpu(i=0):  #@save\n",
    "    \"\"\"如果存在，则返回gpu(i)，否则返回cpu()\"\"\"\n",
    "    if torch.cuda.device_count() >= i + 1:\n",
    "        return torch.device(f'cuda:{i}')\n",
    "    return torch.device('cpu')\n",
    "\n",
    "def try_all_gpus():  #@save\n",
    "    \"\"\"返回所有可用的GPU，如果没有GPU，则返回[cpu(),]\"\"\"\n",
    "    devices = [torch.device(f'cuda:{i}')\n",
    "             for i in range(torch.cuda.device_count())]\n",
    "    return devices if devices else [torch.device('cpu')]\n",
    "\n",
    "try_gpu(), try_gpu(10), try_all_gpus()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 张量与GPU\n",
    "查询张量所在的设备,默认情况下，张量是在CPU上创建的"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "device(type='cpu')"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = torch.tensor([1,2,3])\n",
    "x.device"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "无论何时我们要对多个项进行操作， 它们都必须在同一个设备上。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " * 在GPU上创建的张量只消耗这个GPU的显存。 \n",
    " * 我们可以使用nvidia-smi命令查看显存使用情况。 一般来说，我们需要确保不创建超过GPU显存限制的数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[1., 1., 1.],\n",
       "        [1., 1., 1.]], device='cuda:0')"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = torch.ones(2,3,device=try_gpu())\n",
    "X"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "复制\n",
    "* 我们可以将X传输到第二个GPU并在那里执行操作。 不要简单地X加上Y，因为这会导致异常。\n",
    "* 由于Y位于第二个GPU上，所以我们需要将X移到那里， 然后才能执行相加运算。    \n",
    "![1.png](https://image.baidu.com/search/down?url=https://tva1.sinaimg.cn/large/005T39qaly1h08cftviz4j30d806nmxv.jpg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[1., 1., 1.],\n",
      "        [1., 1., 1.]], device='cuda:0')\n",
      "tensor([[1., 1., 1.],\n",
      "        [1., 1., 1.]], device='cuda:0')\n"
     ]
    }
   ],
   "source": [
    "Z = X.cuda(0)\n",
    "print(X)\n",
    "print(Z)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "多个小操作比一个大操作糟糕得多。    \n",
    " 如果数据不在内存中，框架会首先将其复制到内存中， 这会导致额外的传输开销。  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将模型参数放在GPU上"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "net = nn.Sequential(nn.Linear(3,1))\n",
    "net = net.to(device=try_gpu())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当输入为GPU上的张量时，模型将在同一GPU上计算结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[-0.4974],\n",
       "        [-0.4974]], device='cuda:0', grad_fn=<AddmmBackward0>)"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们确认模型参数存储在同一个GPU上。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "device(type='cuda', index=0)"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net[0].weight.data.device"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "小结\n",
    "* 我们可以指定用于存储和计算的设备，例如CPU或GPU。默认情况下，数据在主内存中创建，然后使用CPU进行计算。\n",
    "* 深度学习框架要求计算的所有输入数据都在同一设备上，无论是CPU还是GPU。\n",
    "* 不经意地移动数据可能会显著降低性能。一个典型的错误如下：计算GPU上每个小批量的损失，并在命令行中将其报告给用户（或将其记录在NumPy ndarray中）时，将触发全局解释器锁，从而使所有GPU阻塞。最好是为GPU内部的日志分配内存，并且只移动较大的日志。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.13 ('pytorch')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "0747f93ff6db21b2db2bf35ad4858dd0825b9c21797c41b4cc32097944ab3f10"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
